A Corpus of Tables in Full-Text Biomedical Research Publications

Authors

Tatyana Shmanina

Ingrid Zukerman

Ai Lee Cheam

Thomas Bochynek

Lawrence Cavedon

Abstract

The development of text mining techniques for biomedical research literature has received increased attention in recent times. However, most of these techniques focus on prose, while much important biomedical data reside in tables. In this paper, we present a corpus created to serve as a gold standard for the development and evaluation of techniques for the automatic extraction of information from biomedical tables. We describe the guidelines used for corpus annotation and the manner in which they were developed. The high inter-annotator agreement achieved on the corpus, and the generic nature of our annotation approach, suggest that the developed guidelines can serve as a general framework for table annotation in biomedical and other scientific domains. The annotated corpus and the guidelines are available at http://www.csse.monash.edu.au/research/umnl/data/index.shtml.

Similar resources

Challenges in Information Extraction from Tables in Biomedical Research Publications: a Dataset Analysis

We present a study of a dataset of tables from biomedical research publications. Our aim is to identify characteristics of biomedical tables that pose challenges for the task of extracting information from tables, and to determine which parts of research papers typically contain information that is useful for this task. Our results indicate that biomedical tables are hard to interpret without t...

BRONCO: Biomedical entity Relation ONcology COrpus for extracting gene-variant-disease-drug relations

Comprehensive knowledge of genomic variants in a biological context is key for precision medicine. As next-generation sequencing technologies improve, the amount of literature containing genomic variant data, such as new functions or related phenotypes, rapidly increases. Because numerous articles are published every day, it is almost impossible to manually curate all the variant information fr...

Structured digital tables on the Semantic Web: toward a structured digital literature

In parallel to the growth in bioscience databases, biomedical publications have increased exponentially in the past decade. However, the extraction of high-quality information from the corpus of scientific literature has been hampered by the lack of machine-interpretable content, despite text-mining advances. To address this, we propose creating a structured digital table as part of an overall ...

PubRunner: A light-weight framework for updating text mining

Biomedical text mining promises to assist biologists in quickly navigating the combined knowledge in their domain. This would allow improved understanding of the complex interactions within biological systems and faster hypothesis generation. New biomedical research articles are published daily and text mining tools are only as good as the corpus from which they work. Many text mining tools are...

Future competencies for hospital management in developing countries: Systematic review

Background: This was a systematic review presenting the future competencies for hospital managers. Methods: Participants, interventions, comparisons and outcomes (PICO) strategy with MeSH terms were used for searching. Databases used were Web of Science, PsycINFO and Medline, EBSCO, ScienceDirect, Emerald, ProQuest, Social Sciences Research Network, Embase, and some Iranian database su...

Publication date: 2016
